Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources
Abstract
We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal structure, and query interfaces) as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. We used the INDUS framework to design algorithms for learning probabilistic models (e.g., Naive Bayes models) that predict the GO functional classification of a protein from training sequences distributed among the SWISSPROT and MIPS data sources. Mappings such as EC2GO and MIPS2GO were used to resolve the semantic differences between these data sources when answering queries posed by the learning algorithms. Our results show that INDUS can be used successfully for the integrative analysis of data from multiple sources that collaborative discovery in computational biology requires.
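The query-centric idea in the abstract can be illustrated with a minimal sketch. It is not the INDUS implementation: the two in-memory lists stand in for autonomous sources, the label names and both mapping tables are hypothetical placeholders (not real EC2GO or MIPS2GO entries), and the choice of per-residue counts as Naive Bayes features is an assumption made only for illustration. Each source answers a count query in the user's vocabulary, and the learner combines the counts without ever pooling the raw data.

```python
from collections import Counter, defaultdict
import math

# Hypothetical mappings from source-specific labels to the user's ontology.
EC_TO_USER = {"ec_kinase": "GO_A", "ec_protease": "GO_B"}
MIPS_TO_USER = {"mips_01": "GO_A", "mips_02": "GO_B"}

# Toy stand-ins for two autonomous sources: (sequence, source-specific label).
SOURCE_1 = [("AAKK", "ec_kinase"), ("LLVV", "ec_protease")]
SOURCE_2 = [("AKAK", "mips_01"), ("VVLL", "mips_02")]


def count_query(records, mapping):
    """Answer a count query at one source, expressed in the user's labels."""
    class_counts = Counter()
    residue_counts = defaultdict(Counter)
    for seq, label in records:
        user_label = mapping[label]  # resolve the semantic difference
        class_counts[user_label] += 1
        residue_counts[user_label].update(seq)  # per-residue counts
    return class_counts, residue_counts


def merge(results):
    """Sum the per-source counts; no raw records leave their source."""
    total_classes = Counter()
    total_residues = defaultdict(Counter)
    for class_counts, residue_counts in results:
        total_classes.update(class_counts)
        for label, counts in residue_counts.items():
            total_residues[label].update(counts)
    return total_classes, total_residues


def predict(seq, class_counts, residue_counts, alphabet_size=20):
    """Multinomial Naive Bayes prediction with add-one smoothing."""
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, c in class_counts.items():
        total = sum(residue_counts[label].values())
        score = math.log(c / n)
        for ch in seq:
            score += math.log(
                (residue_counts[label][ch] + 1) / (total + alphabet_size)
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


class_counts, residue_counts = merge([
    count_query(SOURCE_1, EC_TO_USER),
    count_query(SOURCE_2, MIPS_TO_USER),
])
```

Because Naive Bayes depends on the data only through counts, summing the per-source answers gives the same model that a centralized copy of the mapped data would give, which is the property that makes a warehouse unnecessary in this sketch.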
Similar resources
Algorithms and Software for Collaborative Discovery from Autonomous, Semantically Heterogeneous, Distributed Information Sources
Development of high-throughput data acquisition technologies, together with advances in computing and communications, has resulted in an explosive growth in the number, size, and diversity of potentially useful information sources. This has resulted in unprecedented opportunities in data-driven knowledge acquisition and decision-making in a number of emerging, increasingly data-rich application ...
Ontology Design Patterns for Large-Scale Data Interchange and Discovery
Data and information integration remains a major challenge for our modern information-driven society whereby people and organizations often have to deal with large data volumes coming from semantically heterogeneous sources featuring significant variety between them. In this context, data integration aims to provide a unified view over data residing at different sources through a global schema,...
Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources (KADASH)
ion. For example, the program of study of a student in a data source can be specified as Graduate Program (higher level of abstraction), while the program of study of a different student in the same data source (or even a different data source) can be specified as Doctoral Program (lower level of abstraction). 2005 IEEE ICDM Workshop on KADASH 5 The workshop brings together researchers in relevant...
Knowledge Acquisition from Semantically Heterogeneous Data
Recent advances in sensors, digital storage, computing and communications technologies have led to a proliferation of autonomously operated, geographically distributed data repositories in virtually every area of human endeavor, including e-business and e-commerce, e-science, e-government, security informatics, etc. Effective use of such data in practice (e.g., building useful predictive models...
A Methodology for Terminology-based Knowledge Acquisition and Integration
In this paper we propose an integrated knowledge management system in which terminology-based knowledge acquisition, knowledge integration, and XML-based knowledge retrieval are combined using tag information and ontology management tools. The main objective of the system is to facilitate knowledge acquisition through query answering against XML-based documents in the domain of molecular biolog...